Asynchronous Stochastic Gradient Descent with Delay Compensation
Authors
Abstract
With the fast development of deep learning, people have started to train very large neural networks using massive amounts of data. Asynchronous Stochastic Gradient Descent (ASGD) is widely used for this task, but it is known to suffer from the problem of delayed gradients: by the time a local worker adds the gradient it has computed to the global model, the global model may already have been updated by other workers, and this gradient becomes “delayed”. We propose a novel technique to compensate for this delay, so as to make the optimization behavior of ASGD closer to that of sequential SGD. This is done by leveraging a Taylor expansion of the gradient function and efficient approximators of the Hessian matrix of the loss function. We call the corresponding new algorithm Delay Compensated ASGD (DC-ASGD). We evaluated the proposed algorithm on the CIFAR-10 and ImageNet datasets, and the experimental results demonstrate that DC-ASGD outperforms both synchronous SGD and ASGD, and nearly approaches the performance of sequential SGD.
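To make the compensation idea in the abstract concrete, below is a minimal sketch of a parameter-server update that corrects a delayed gradient with a first-order Taylor term, using the element-wise outer product of the gradient (scaled by a coefficient lambda) as a cheap Hessian approximator. Function and variable names, the learning rate, and the value of lambda are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np

def dc_asgd_update(w_global, grad_stale, w_snapshot, lr=0.1, lam=0.04):
    """One parameter-server step with a delay-compensated gradient (sketch).

    w_global   -- current global parameters, possibly already updated by
                  other workers since the snapshot was taken
    grad_stale -- gradient the worker computed at the stale snapshot
    w_snapshot -- the snapshot of the model the worker read beforehand
    lam        -- compensation strength; lam = 0 reduces to plain ASGD
    """
    # First-order Taylor correction of the delayed gradient, with the
    # Hessian approximated element-wise by lam * g * g.
    compensated = grad_stale + lam * grad_stale * grad_stale * (w_global - w_snapshot)
    return w_global - lr * compensated

# Toy usage: worker B updates the global model before worker A's
# delayed gradient is applied with compensation.
w = np.zeros(3)                              # global model
w_snap = w.copy()                            # worker A reads the model
w = w - 0.1 * np.array([0.5, -0.2, 0.1])     # worker B updates first (delay)
g_stale = np.array([0.4, -0.1, 0.3])         # worker A's gradient at w_snap
w = dc_asgd_update(w, g_stale, w_snap)       # compensated application
```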
Similar Papers
Accelerating Asynchronous Algorithms for Convex Optimization by Momentum Compensation
Asynchronous algorithms have attracted much attention recently due to the crucial demand for solving large-scale optimization problems. However, accelerated versions of asynchronous algorithms are rarely studied. In this paper, we propose the “momentum compensation” technique to accelerate asynchronous algorithms for convex problems. Specifically, we first accelerate the plain Asynchronous ...
Supplementary Material: Asynchronous Stochastic Gradient Descent with Delay Compensation
where $C_{ij} = \frac{1}{1+\lambda}\left(\frac{u_i u_j \beta}{l_i l_j \sqrt{\alpha}}\right)$, $C'_{ij} = \frac{1}{(1+\lambda)\,\alpha\,(l_i l_j)}$, and the model converges to the optimal model, then the MSE of $\lambda G(w_t)$ is smaller than the MSE of $G(w_t)$ in approximating the Hessian $H(w_t)$. Proof: For simplicity, we abbreviate $\mathbb{E}(Y \mid x, w^*)$ as $\mathbb{E}$, $G(w_t)$ as $G_t$, and $H(w_t)$ as $H_t$. First, we calculate the MSE of $G_t$ and $\lambda G_t$ in approximating $H_t$ for each element of $G_t$. We denote the element in the $i$-th r...
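To make the comparison in the truncated snippet above explicit (stated here as an assumption about the notation, since the full derivation is not shown): the MSE is taken element-wise with respect to the expectation $\mathbb{E}(Y \mid x, w^*)$, and the claim is that the scaled approximator $\lambda G_t$ is at least as accurate as $G_t$ for every entry.

```latex
\mathrm{MSE}\!\left[(G_t)_{ij}\right]
  = \mathbb{E}\!\left[\bigl((G_t)_{ij} - (H_t)_{ij}\bigr)^2\right],
\qquad
\mathrm{MSE}\!\left[(\lambda G_t)_{ij}\right]
  \le \mathrm{MSE}\!\left[(G_t)_{ij}\right]
\quad \text{for all } i, j .
```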
Parallel Asynchronous Stochastic Variance Reduction for Nonconvex Optimization
Nowadays, asynchronous parallel algorithms have received much attention in the optimization field due to the crucial demands of modern large-scale optimization problems. However, most asynchronous algorithms focus on convex problems; analysis of nonconvex problems is lacking. For the Asynchronous Stochastic Gradient Descent (ASGD) algorithm, the best result from (Lian et al., 2015) can only achieve an ...
The Convergence of Stochastic Gradient Descent in Asynchronous Shared Memory
Stochastic Gradient Descent (SGD) is a fundamental algorithm in machine learning, representing the optimization backbone for training several classic models, from regression to neural networks. Given the recent practical focus on distributed machine learning, significant work has been dedicated to the convergence properties of this algorithm under the inconsistent and noisy updates arising from...
Asynchronous Stochastic Gradient Descent with Variance Reduction for Non-Convex Optimization
We provide the first theoretical analysis of the convergence rate of the asynchronous stochastic variance reduced gradient (SVRG) descent algorithm for nonconvex optimization. Recent studies have shown that asynchronous stochastic gradient descent (SGD) based algorithms with variance reduction converge at a linear rate on convex problems. However, there is no work analyzing asy...